Hierarchy in Web Page Similarity Link Analysis
Abstract
Rather than using traditional text analysis to discover Web pages similar to a given page, we investigate applying link analysis. Because Web pages exist in a link-rich environment, link analysis has the potential to relate pages by any property imaginable, since links are not restricted to intrinsic properties of the page text or metadata. In particular, while link analysis for Web page similarity has been explored, prior work has deliberately ignored the explicitly hierarchical host and pathname structure within URLs. To exploit this property, we generalize Kleinberg’s well-known “hubs and authorities” HITS algorithm; adapt this algorithm to accommodate hierarchical link structure; test some sample web queries; and argue that the results are potentially superior and that the algorithm itself is better motivated.
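For readers unfamiliar with the starting point, the following is a minimal sketch of the standard (non-hierarchical) HITS iteration that the abstract says is being generalized. It is not the hierarchical variant described here, whose details the abstract does not give. The function names `hits` and `normalize`, the page labels, and the fixed iteration count are all assumptions made for illustration.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (no-op scale if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(edges, iterations=50):
    """Standard HITS on a list of (source, target) links.

    Returns (hub, authority) score dictionaries keyed by page.
    The iteration count is an illustrative assumption, not a tuned value.
    """
    pages = sorted({page for edge in edges for page in edge})
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link in.
        new_auth = {page: 0.0 for page in pages}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the (new) authority scores of pages linked to.
        new_hub = {page: 0.0 for page in pages}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical link graph: pages "a" and "b" both link to "c"; "a" also links to "d".
edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub, auth = hits(edges)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
```

In this toy graph the page with the most in-links from good hubs ends up as the top authority, and the page linking to the most authorities ends up as the top hub. A hierarchical adaptation would presumably also group pages by URL host and path, but how that is done is not specified in the abstract.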
Similar resources
A Comparison Framework of Similarity Metrics Used for Web Access Log Analysis
In this paper, different types of web session similarity metrics are compared and combined for better web session clustering. Syntactic and co-occurrence information are used for similarity calculation. Syntactic information on a web page includes the place of the page in the directory hierarchy. Co-occurrence information is the number of times two web pages occur in the same session. V...
An Iterative Link-based Method for Parallel Web Page Mining
Identifying parallel web pages from bilingual web sites is a crucial step of bilingual resource construction for crosslingual information processing. In this paper, we propose a link-based approach to distinguish parallel web pages from bilingual web sites. Compared with the existing methods, which only employ the internal translation similarity (such as content-based similarity and page struct...
Shear-Flexural Interaction in Analysis of Reduced Web Section Beams using VM Link Element
Reduced web section beams in shear-yielding moment-resistant steel frames are used to dissipate earthquake energy. Finite element analysis indicates that the failure mode of these beams is governed by the combination of shear force and flexural moment. Therefore, the analysis of frames with reduced web section beams requires consideration of shear-flexural interaction in those sections. In ...
MFCRank: A Web Ranking Algorithm Based on Correlation of Multiple Features
This paper presents a new ranking algorithm MFCRank for topic-specific Web search systems. The basic idea is to correlate two types of similarity information into a unified link analysis model so that the rich content and link features in Web collections can be exploited efficiently to improve the ranking performance. First, a new surfer model JBC is proposed, under which the topic similarity i...
Hierarchical Web-Page Clustering via In-Page and Cross-Page Link Structures
Despite the wide diversity of web-pages, web-pages residing in a particular organization are, in most cases, organized with semantically hierarchical structures. For example, the website of a computer science department contains pages about its people, courses and research, among which pages of people are categorized into faculty, staff and students, and pages of research diversify into differ...
Publication date: 2006